{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Generate Dalle Images With Conversable Agents"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "This notebook illustrates how to add the image generation capability to a conversable agent. \n",
    "\n",
    "````{=mdx}\n",
    ":::info Requirements\n",
    "Some extra dependencies are needed for this notebook, which can be installed via pip:\n",
    "\n",
    "```bash\n",
    "pip install ag2[openai,lmm]\n",
    "```\n",
    "\n",
    "For more information, please refer to the [installation guide](https://docs.ag2.ai/latest/docs/user-guide/basic-concepts/installing-ag2).\n",
    ":::\n",
    "````\n",
    "First, let's import all the required modules to run this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "from IPython.display import display\n",
    "from PIL.Image import Image\n",
    "\n",
    "import autogen\n",
    "from autogen.agentchat.contrib import img_utils\n",
    "from autogen.agentchat.contrib.capabilities import generate_images"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3",
   "metadata": {},
   "source": [
    "Let's define our LLM configs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [],
   "source": [
    "gpt_config = {\n",
    "    \"config_list\": [{\"model\": \"gpt-4-turbo-preview\", \"api_key\": os.environ[\"OPENAI_API_KEY\"]}],\n",
    "    \"timeout\": 120,\n",
    "    \"temperature\": 0.7,\n",
    "}\n",
    "gpt_vision_config = {\n",
    "    \"config_list\": [{\"model\": \"gpt-4-vision-preview\", \"api_key\": os.environ[\"OPENAI_API_KEY\"]}],\n",
    "    \"timeout\": 120,\n",
    "    \"temperature\": 0.7,\n",
    "}\n",
    "dalle_config = {\n",
    "    \"config_list\": [{\"model\": \"dall-e-3\", \"api_key\": os.environ[\"OPENAI_API_KEY\"]}],\n",
    "    \"timeout\": 120,\n",
    "    \"temperature\": 0.7,\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "````{=mdx}\n",
    ":::tip\n",
    "Learn more about configuring LLMs for agents [here](https://docs.ag2.ai/latest/docs/user-guide/basic-concepts/llm-configuration).\n",
    ":::\n",
    "````\n",
    "\n",
    "Our system will consist of 2 main agents:\n",
    "1. Image generator agent.\n",
    "2. Critic agent.\n",
    "\n",
    "The image generator agent will carry a conversation with the critic, and generate images based on the critic's requests."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "CRITIC_SYSTEM_MESSAGE = \"\"\"You need to improve the prompt of the figures you saw.\n",
    "How to create an image that is better in terms of color, shape, text (clarity), and other things.\n",
    "Reply with the following format:\n",
    "\n",
    "CRITICS: the image needs to improve...\n",
    "PROMPT: here is the updated prompt!\n",
    "\n",
    "If you have no critique or a prompt, just say TERMINATE\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "def _is_termination_message(msg) -> bool:\n",
    "    # Detects if we should terminate the conversation\n",
    "    if isinstance(msg.get(\"content\"), str):\n",
    "        return msg[\"content\"].rstrip().endswith(\"TERMINATE\")\n",
    "    elif isinstance(msg.get(\"content\"), list):\n",
    "        for content in msg[\"content\"]:\n",
    "            if isinstance(content, dict) and \"text\" in content:\n",
    "                return content[\"text\"].rstrip().endswith(\"TERMINATE\")\n",
    "    return False\n",
    "\n",
    "\n",
    "def critic_agent() -> autogen.ConversableAgent:\n",
    "    return autogen.ConversableAgent(\n",
    "        name=\"critic\",\n",
    "        llm_config=gpt_vision_config,\n",
    "        system_message=CRITIC_SYSTEM_MESSAGE,\n",
    "        max_consecutive_auto_reply=3,\n",
    "        human_input_mode=\"NEVER\",\n",
    "        is_termination_msg=lambda msg: _is_termination_message(msg),\n",
    "    )\n",
    "\n",
    "\n",
    "def image_generator_agent() -> autogen.ConversableAgent:\n",
    "    # Create the agent\n",
    "    agent = autogen.ConversableAgent(\n",
    "        name=\"dalle\",\n",
    "        llm_config=gpt_vision_config,\n",
    "        max_consecutive_auto_reply=3,\n",
    "        human_input_mode=\"NEVER\",\n",
    "        is_termination_msg=lambda msg: _is_termination_message(msg),\n",
    "    )\n",
    "\n",
    "    # Add image generation ability to the agent\n",
    "    dalle_gen = generate_images.DalleImageGenerator(llm_config=dalle_config)\n",
    "    image_gen_capability = generate_images.ImageGeneration(\n",
    "        image_generator=dalle_gen, text_analyzer_llm_config=gpt_config\n",
    "    )\n",
    "\n",
    "    image_gen_capability.add_to_agent(agent)\n",
    "    return agent"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8",
   "metadata": {},
   "source": [
    "We'll define `extract_img` to help us extract the image generated by the image generator agent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9",
   "metadata": {},
   "outputs": [],
   "source": [
    "def extract_images(sender: autogen.ConversableAgent, recipient: autogen.ConversableAgent) -> Image:\n",
    "    images = []\n",
    "    all_messages = sender.chat_messages[recipient]\n",
    "\n",
    "    for message in reversed(all_messages):\n",
    "        # The GPT-4V format, where the content is an array of data\n",
    "        contents = message.get(\"content\", [])\n",
    "        for content in contents:\n",
    "            if isinstance(content, str):\n",
    "                continue\n",
    "            if content.get(\"type\", \"\") == \"image_url\":\n",
    "                img_data = content[\"image_url\"][\"url\"]\n",
    "                images.append(img_utils.get_pil_image(img_data))\n",
    "\n",
    "    if not images:\n",
    "        raise ValueError(\"No image data found in messages.\")\n",
    "\n",
    "    return images"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10",
   "metadata": {},
   "source": [
    "Start the conversation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "dalle = image_generator_agent()\n",
    "critic = critic_agent()\n",
    "\n",
    "img_prompt = \"A happy dog wearing a shirt saying 'I Love AG2'. Make sure the text is clear.\"\n",
    "# img_prompt = \"Ask me how I'm doing\"\n",
    "\n",
    "result = dalle.initiate_chat(critic, message=img_prompt)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12",
   "metadata": {},
   "source": [
    "Let's display all the images that was generated by Dalle"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13",
   "metadata": {},
   "outputs": [],
   "source": [
    "images = extract_images(dalle, critic)\n",
    "\n",
    "for image in reversed(images):\n",
    "    display(image.resize((300, 300)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "front_matter": {
   "description": "Generate images with conversable agents.",
   "tags": [
    "capability",
    "multimodal"
   ]
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
